KeyVec: Key-semantics Preserving Document Representations
Authors
Abstract
Previous studies have demonstrated the empirical success of word embeddings in various applications. In this paper, we investigate the problem of learning distributed representations for text documents, which many machine learning algorithms take as input for a number of NLP tasks. We propose a neural network model, KEYVEC, which learns document representations with the goal of preserving the key semantics of the input text. It enables the learned low-dimensional vectors to retain the topics and important information from the documents, which then flow to downstream tasks. Our empirical evaluations show the superior quality of KEYVEC representations in two different document understanding tasks.
Similar resources
Category Theory as a Foundation for Document Processing
Documents, particularly electronic documents that are created, disseminated, and used with computers, have several representations. Users may wish to work with such electronic documents in any of a document's representations, and this can make it difficult to maintain consistency between the different representations of a document. Category theory provides insight into this problem. We begin by d...
Integrating Structure and Meaning: A New Method for Encoding Structure for Text Classification
Current representation schemes for automatic text classification treat documents as syntactically unstructured collections of words or ‘concepts’. Past attempts to encode syntactic structure have treated part-of-speech information as another word-like feature, but have been shown to be less effective than non-structural approaches. We propose a new representation scheme using Holographic Reduce...
Relational Semantics for Flow Graph Representations as Basis for Transformational Design of Digital Systems
Transformational design is a promising design methodology which combines correctness by construction and interactive design. In this design methodology the design steps are behaviour preserving transformations of one design representation into another. The representations used in transformational design need to have formal semantical models in order to prove the correctness, the behaviour prese...
A Joint Semantic Vector Representation Model for Text Clustering and Classification
Text clustering and classification are two main tasks of text mining. Feature selection plays a key role in the quality of the clustering and classification results. Although word-based features such as term frequency-inverse document frequency (TF-IDF) vectors have been widely used in different applications, their shortcoming in capturing semantic concepts of text motivated researchers to use...
Meaningfulness of Religious Language in the Light of Conceptual Metaphorical Use of Image Schema: A Cognitive Semantic Approach
According to modern religious studies, religions are rooted in certain metaphorical representations, so they are metaphorical in nature. This article aims to show, first, how conceptual metaphors employ image schemas to make our language meaningful, and then to assert that image-schematic structure of religious expressions, by which religious metaphors conceptualize abstract meanings, is the ba...
Journal title:
- CoRR
Volume: abs/1709.09749
Publication date: 2017